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ABSTRACT 

Hemifacial microsomia (HFM) is the second most common facial anomaly after 
cleft lip and palate. The phenotype is highly variable and most cases are sporadic. 
Here, we investigated the disorder in a large pedigree with five affected 
individuals spanning eight meioses. We performed whole-exome sequencing and 
a genome-wide survey of segmental variations. Analysis of the exome sequencing 
results indicated the absence of a pathogenic coding point mutation. Inspection of 
segmental variations identified a 1.3Mb duplication of chromosome 14q22.3 in all 
affected individuals that was absent in more than 1000 chromosomes of ethnically 
matched controls. The duplication was absent in seven additional sporadic HFM 
cases, which is concordant with the known heterogeneity of the disorder. To find 
the critical gene in the duplicated region, we analyzed signatures of human 
craniofacial disease networks, mouse expression data, and predictions of dosage 
sensitivity. All of these approaches implicated OTX2 as the most likely causal 
gene. Moreover, OTX2 is a known oncogenic driver in medulloblastoma, a 
condition that was diagnosed in the proband during the course of our study. Our 
findings highlight dosage sensitivity of OTX2 in human craniofacial development 
and suggest a possible shared etiology between a subtype of hemifacial 
microsomia and medulloblastoma. 
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INTRODUCTION 

Hemifacial microsomia (HFM; also termed oculoauriculovertebral spectrum or 
Goldenhar syndrome, OMIM: 164210) is a highly heterogeneous condition with an 
estimated rate of 1 in 5,600 to 20,000 births [1]. The hallmarks of this disorder are 
marked facial asymmetry due to maxillary and mandibular hypoplasia and ear 
malformations such as preauricular skin tags, microtia, anotia, and conductive 
hearing loss. Some cases also present epibulbar dermoids and coloboma of the 
upper eyelid, cleft lip and palate, as well as cardiac, renal, and vertebral defects. 
To a lesser extent, the disorder also involves neurological anomalies and 
developmental delays or mental retardation [1-3]. 

The characteristic facial anomalies of HFM cases are attributed to disruptions in 
the first and second pharyngeal arches during days 30-45 of gestation in humans 
[1]. These arches contribute to the development of muscles of mastication, the 
maxilla, the mandible, middle ear bones, muscles of facial expression, and the 
stapedial artery. Animal models suggest embryonic hemorrhage or a deficiency in 
neural crest cell migration as the pathogenesis, disrupting normal development of 
the pharyngeal arch derived structures [4]. 

The HFM spectrum reflects a complex pathogenesis that presumably includes 
both extrinsic and genetic risk factors [2]. Several epidemiological surveys 
suggest a role for environmental factors that affect the vascular system, including 
use of vasoactive agents, hypoxia, exposure to teratogens, and gestational 
diabetes [5]. While most HFM cases are sporadic, approximately 2-10% of cases 
are familial and present in more than one generation, supporting the contribution 
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of genetic risk factors [6,7]. Careful examination of seemingly unaffected relatives 
of a large number of probands revealed familial aggregation of mild craniofacial 
malformations and preauricular skin tags [8]. These mild features are relatively 
rare in the general population but do not meet the clinical criteria for HFM, leading 
to a decreased perception of family history. Segregation analysis of 74 families 
strongly favored an autosomal dominant mode of inheritance with incomplete 
penetrance over recessive or polygenic transmission [9]. These results suggest 
that genetics plays a broad etiological role in the manifestation of the disorder. 

Genetic investigations of HFM cases have not yet clearly defined the critical 
genes involved in this disorder. Several studies have reported facial asymmetry 
and mandibular hypoplasia in cases with gross chromosomal aberrations and 
trisomies [10-15]. However, these patients exhibit multi-organ pathologies that are 
atypical of most HFM cases, suggesting that they represent distinct types of 
syndromes. Genome-wide linkage analysis of 3 HFM pedigrees revealed potential 
linkage to 14q32, 1 1q1 2-1 3 [16], and 15q26.2-q26.3 [17]. Candidate gene 
sequencing in these studies failed to find a pathogenic variation. Rooryck et al. 
[18] performed array CGH on a cohort of 86 HFM patients, most without family 
history of the disorder. They found 12 copy number variants (CNVs) ranging from 
2.7kb to 2.3Mb (median: 153Kb). However, none of these CNVs were recurrent 
and 9 out of the 10 autosomal CNVs were also present in unaffected individuals. 
The authors concluded that it is difficult to interpret to what extent these CNVs 
contribute to the disorder. To date, the field has yet to identify a strong etiological 
gene that is responsible for the pathogenesis of the disorder. 
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Here, we conducted a systematic analysis to identify an etiological variant in HFM. 
To increase the power of the investigation, we focused on a large family with 
multiple affected individuals. To the best of our knowledge, this family is the 
largest HFM kinship to date that is described in the literature. We considered both 
exonic mutations and copy number variations to further increase the probability of 
identifying the etiological locus while excluding bystander variations [19]. This 
process revealed a segmental duplication of 8 genes that segregates with the 
disorder. An unbiased HFM disease network analysis and expression profiling 
implicate 0TX2 as the pathogenic gene in the CNV. 



RESULTS 

Clinical presentation 

We identified a five generation Ashkenazi kinship that displays variable HFM 
anomalies in five individuals separated by a total of eight meiosis events (Figure 
1, Table 1). In all cases, the family denied consanguinity and the disorder appears 
to follow an autosomal dominant segregation pattern with incomplete penetrance 
and variable expressivity. 

The proband, subject V.3, was presented to the Craniofacial Department of the 
Rambam Medical Center in Israel at the age of three. She was born after normal 
pregnancy (42 weeks) and caesarian delivery. Clinical examination found right 
mandibular hypoplasia and facial asymmetry, cleft #7 according to Tessier's 
craniofacial classification system, preauricular skin tags, and grade II microtia, all 
on the right side. Deafness in the right ear was diagnosed at the age of 2 months. 
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She is of normal intelligence and no other abnormalities were noted at the time 
(Table 1). The proband underwent a combined surgical orthodontic manipulation 
using the distraction osteogenesis technique to elongate the right mandibular 
ramus. During the course of this study, at age seven, she was diagnosed with a 
medullosblastoma in the fourth ventricle. The tumor was completely resected, 
after which the child received craniospinal radiotherapy and chemotherapy [see a 
case study on her cancer treatment here: [20]]. 

The proband's mother (IV.3), grandmother (111.1) and cousin (V.2) were also 
examined at the Craniofacial Department of the Rambam Medical Center. All 
individuals exhibited milder facial asymmetry with unilateral clefts and preauricular 
skin tags without ear involvement. Examination of the proband's uncle (IV.2) did 
not reveal any facial anomalies, indicating incomplete penetrance of the disorder. 

The proband's first cousin twice removed (III. 3) was identified at a later stage of 
the study. He presented mild facial asymmetry on his left side without auricle 
involvement and reported that his grandmother (1.1) displayed similar features. 

Analysis of exonic variants shows evidence of no causal mutation 

We performed whole exome sequencing of individuals 111.1, V.2, and V.3. The 
average autosomal coverage of the targeted regions in the three samples was 
95x-105x reads per base pair. More than 96% of each exome was covered by at 
least one read (Supplementary Figure 1). Exome sequencing revealed 22,252, 
22,746, and 23,175 exonic variants in 111.1, V.2, and V.3 respectively. We 
observed transition/transversion ratios of 2.89-3.00 and homozygous to 
heterozygous mutation ratios of 0.56-0.58. In parallel, we also conducted 
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genome-wide genotyping of these three samples using the Affymetrix SNP Array 
6.0. Comparing shared variations between the two platforms showed 
concordance rates of more than 98% for non -reference loci (Supplementary 
Table 1). All of these technical indicators are consistent with the results of 
previous studies [21-23], supporting the quality of the exome sequencing data. 

We passed the exonic variations through a series of filters to find mutations that fit 
the rare familial pathology (Table 2). First, we excluded synonymous variants. 
Second, we excluded variations that appear at a frequency greater than 0.1% in 
large-scale sequencing projects such as the Exome Sequencing Project, 1000 
Genomes, and ClinSeq, as documented in dbSNP. In addition, we excluded 
variations that also appeared at least twice in the exome sequencing data of 21 
healthy Ashkenazi Jews (provided by Noam Shomron, Tel Aviv University). In the 
Supplementary Note, we show that these frequency cutoffs are very 
conservative. Third, we focused only on variants that reside in regions that are 
identical by descent (IBD) in all individuals. Variants that reside in these 
haplotypes where transmitted from 111.1 to V.2 and V.3. Shared variants outside 
these regions are from ancient coalescent events and reflect inheritance patterns 
that do not segregate with the phenotype. Using genome-wide genotype data, we 
identified 33 autosomal segments that are IBD in these three individuals, with a 
total size of 421.2Mb (14.5% of the autosome). This value is close to the 
theoretical expectation of a familial relationship of one grandmother and two 
cousins (1/4x1/2=12.5% on average). After excluding exonic variations that fall 
outside these segments, the number of plausible candidates was reduced to 84, 
90, and 72 variations in 111.1, V.2, and V.3. Finally, we retained only variations in 
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the IBD segments that appear in all three individuals (Supplementary Table 3), 
which resulted in 40 candidates (26 SNPs and 14 indels). Only 4 out of these 40 
variations were not documented in dbSNP. 

At this stage, we were able to recruit individual III. 3 to the study. We conducted 
array-based genome-wide genotyping and used the results to determine shared 
segments that are IBD in all four individuals: 111.1, III.3, V.2, and V.3. This process 
resulted in 16 segments with a total length of 59Mb (2.0% of the autosome that is 
shared between all four individuals). Again, this number is close to the theoretical 
expectation of 1/4x1/4x1/4=1.6%. Excluding variants outside these regions 
returned zero shared candidates of the 32 variants from the previous step. This 
filtering process showed that there is no single non-synonymous variant of 
relatively rare frequency in the population that segregates with the disorder. 

To further validate our findings, we performed Sanger sequencing of 37 variants 
that were identified in the exome sequencing results but excluded after the final 
IBD filtration step. Four of these variants were located in genes with biological 
activities that could relate to the disorder (DAB2, IQSEC1, KIAA1456, and 
ADAM28), such as vascularization, angiogenesis, imprinting, and neurogenesis 
[24-27]. However, Sanger sequencing of all 37 variations, including these four 
genes, showed that individual III. 3 is does not carry the variant, as expected from 
the IBD analysis (Supplementary Figure 2; Supplementary Table 4). 
Importantly, these results support the validity of the IBD filtration technique and 
provide additional evidence supporting the absence of an etiological point 
mutation in the exome. 
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Copy Number Variation Analysis Identifies a Familial Duplication of 14q22.3 

Given the absence of point mutations, we turned to copy number analysis using 
the genotype data from the genome-wide SNP array. Our analysis revealed a 1 .3 
Mb duplication of 14q22.3 (chrl 4:57,1 41 ,867-58,495,517) in all four individuals 
that segregated along all 8 meioses (Figure 2a). In general, CNVs of this length 
are rare and typically deleterious [28]. No other detected CNVs (>10kb) were 
found to segregate with the disorder. To increase the sensitivity, we repeated the 
CNV analysis and inspected only CNVs that are shared in individuals 111.1, V.2, 
V.3. We excluded individual II 1. 3 from this analysis because the array genotyping 
was performed separately and showed greater systematic noise. This process 
revealed seven CNV segments (>10Kb) in addition to the duplication of 14q22.3. 
However, all but one where also found in healthy Ashkenazi controls from 
genome-wide genotyping array data [29]. The one segment that was not present 
in the Ashkenazi controls was a -37 kb duplication of a non-coding region 
(chr3:1 87,279,1 70-1 87,31 6,070) that overlapped a known duplication found in 
healthy Asian controls in the Database of Genomic Variants (DGV: nssvl 548729). 
Moreover, we did not see any evidence of this region in the array data for III. 3. 
Thus, we concluded that the duplication of 14q22.3 is the only likely CNV that 
segregates with the disorder. 

In order to confirm the expected rarity of this duplication, we evaluated its 
frequency in the general Ashkenazi population. Analysis of the genome-wide 
genotyping array data from 942 healthy Ashkenazi chromosomes [29] returned 
two copies for this region. In addition, no duplications were found in this region in 
CNV analysis of deep whole genome sequencing data from 284 chromosomes of 

9 



Downloaded from http://biorxiv.org/on September 18, 2014 

Ashkenazi controls sequenced by Complete Genomics that are part of The 
Ashkenazi Genome Consortium (TAGC) and 1842 chromosomes from phase I of 
the 1000 Genomes Project [30]. These population-specific results support a 
familial variant that segregates with the disorder. 

To validate our results, we performed qPCR analysis of the duplicated region 
using Taqman assays (Figure 2b). Three probes targeting genes in the 
duplication (OTX2-OS1, EXOC5, and NAA30) were confirmed as CN=3 (copy 
number) in individuals IV.2, IV.3, and III. 3. We also observed duplication of OTX2- 
OS1 and NAA30 in V.3 and of NAA30 in 111.1, confirming segregation of this CNV 
along all informative meioses of the family. Assays targeting OTX2-OS1, EXOC5, 
and NAA30 returned CN=2 in all HapMap controls and OTX2-OS1 and NAA30 
were both CN=2 in 45 Ashkenazi control samples. To validate the boundaries of 
the CNV, we also targeted KTN1 and PSMA3, upstream and downstream of the 
predicted CNV. Both probes returned CN=2 in affected family members and 
HapMap controls (Figure 2b). 

In order to evaluate the presence of the duplication in additional HFM cases in 
Israel, the Craniofacial Department of Rambam Medical Center collected DNA 
from 7 families that consisted of one affected offspring and unaffected parents. 
Interrogation of 2 genes in the duplicated region (NAA30 and OTX2-OS1) by 
qPCR did not reveal any copy number changes in the seven additional HFM 
cases (Supplementary Figure 3). These findings suggest a distinct genetic 
etiology of the disorder in our family and are consistent with previous studies that 
described genetic heterogeneity [18]. However, a literature search revealed that a 
spectrum of genetic lesions in the 14q22 region have been associated with 
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various facial anomalies. Ou et al. [31] reported a complex event of a duplication 
of 11.8Mb that fully encompasses our 14q22 region and translocation to 13q21. 
Interestingly, the proband suffered a range of clinical signs resembling HFM 
including facial asymmetry, mandibular hypoplasia, and ear defects in addition to 
developmental delay, lacrimal duct stenosis and renal anomalies. Northup et al. 
[32] reported a large pericentric inversion inv(14)(p1 1 .2q22.3) in a proband with 
HFM signs, inherited from his phenotypically normal mother. Ballesta-Martinez et 
al. [33] recently published a short clinical report of a 14q22 duplication in a 
Spanish family with variable phenotypes resembling HFM. All of these add 
additional support to our findings. 

Candidate Gene Prioritization in the Duplicated Segment 

We sought to predict the etiological gene that contributes most to the phenotype 
in an unbiased manner among the eight genes (OTX2, OTX2-OS1, EXOC5, 
AP5M1, NAA30, C14orf105, SLC35F4, and C14orf37 [partial]) that reside in the 
duplicated region. 

First, we prioritized the genes in the duplicated region based on the similarity of 
their molecular signatures with known etiological genes of other facial 
malformations. We and others have successfully identified etiological genes using 
this guilt-by-association approach in previous studies of rare human disorders [34- 
36]. The basis of this technique is that similar phenotypes are caused by genes 
that reside in close biological modules, such as the same pathway, co-expression 
cluster, and shared regulatory control (Goh et al 2007). To identify a set of 
disorders similar to HFM in an unbiased manner, we used MimMiner, which ranks 
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clinical conditions in OMIM based on phenotypic resemblance [37]. The top three 
phenotypes with similar features to HFM were CHARGE syndrome (OMIM: 
214800), VACTERL association (OMIM: 314390), and Townes-Brocks syndrome 
(OMIM: 107480). In fact, HFM and TBS are both characterized by first and second 
arch defects, including ear, jaw, and kidney malformations [38]. Interestingly, a 
previous study also cited the commonalities between HFM, CHARGE, and 
VACTERL [39], adding additional support to the MimMiner prediction. We then 
compared the biological signatures of all coding genes in the duplicated region to 
CHD7, ZIC3, and SALL1 the corresponding genes of the three syndromes. To 
increase the robustness of our analysis, we tested these similarities using two 
gene prioritization tools: Endeavour [40] and ToppGene [41]. These algorithms 
utilize different biological datasets and employ distinct prioritization procedures. 
These two algorithms independently ranked OTX2 as the gene with the closest 
molecular signature to other facial anomalies (Figure 3a). 

Disease genes tend to be more highly expressed in affected tissues than in those 
that are unaffected [42,43]. In order to further support the pathogenicity of the 
duplication, we used publicly available expression array profiles of mouse 
embryonic tissue to compare the expression of the duplicated genes in affected 
versus unaffected tissues. Specifically, we analyzed expression levels in the 
pharyngeal arches at embryonic day 10.5 and in the entire head at E13.5. These 
developmental stages approximately overlap with the suggested critical periods 
for the HFM developmental perturbation in humans [1]. We contrasted these 
expression levels with the expression profiles of liver, heart, and lung (E10.5) and 
heart and urogenital epithelium (E13.5) since these tissues are rarely implicated in 
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HFM. At E10.5, the arrays contained data for Otx2, Ap5m1, Naa30, and Slc35f4. 
At E13.5, the arrays contained data for Otx2, Otx2os1, Exoc5, Ap5m1, Naa30, 
and Slc35f4. The expression profiles showed that Otx2 tends to be more highly 
expressed in the affected tissues than other duplicated genes at E10.5 and E13.5 
compared to any of the unaffected tissues (Figure 3b). 

Finally, we also evaluated the general sensitivity of the genes in the region to 
duplication. Huang et al. [44] developed a gene-level classifier that compares 
evolutionary, functional, gene-structure, and interaction patterns between 
haplosufficient and haploinsufficient genes. Interestingly, they found higher 
expression and tissue specificity of haploinsufficient genes early in development. 
Although the classifier predicts the probability of haploinsufficiency, it is also 
useful for detecting genes with increased dosage sensitivity (M. Hurles, personal 
communication, August 2013). Three of the duplicated genes were included in 
their classifier: OTX2 had the highest sensitivity score (0.9) followed by NAA30 
(0.474) and SLC35F4 (0.418) (Figure 3c). To summarize, all of our in silico 
analysis techniques suggested that duplicated OTX2 is the most likely 
pathological gene in our HFM cases. 

DISCUSSION 

We conducted a systematic study of familial HFM that implicates OTX2 dosage 
sensitivity in the disorder. OTX2 encodes a transcription factor that plays a critical 
role in craniofacial development and anterior brain morphogenesis. Loss-of- 
function studies in mice showed that null embryos fail to develop the anterior head 
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and die during embryogenesis while Otx? 1 ' mice exhibit a range of severe 
craniofacial anomalies, including micrognathia, agnathia, anophthalmia, and head 
narrowing [45]. The severity of the phenotype depends on the genetic background 
[46], consistent with the wide spectrum of phenotypes associated with loss of 
function in humans. Temporal loss of one copy of Otx2 during mouse 
embryogenesis up to E12.5 results in haploinsufficiency that leads to significantly 
low survival rates and abnormal head development, including reduction or 
absence of the forebrain, eyes, and jaw [47]. 0TX2 hemizygous deletions and 
non-synonymous point mutations have been reported in patients with severe 
ocular malformations and hypopituitarism, symptoms that are not seen in our 
pedigree [48-50]. 

The OTX2 germline duplication in our case suggests a potential link to the 
medulloblastoma of the proband. OTX2 is a known oncogenic driver of 
medulloblastoma [51]. Focal duplications and overexpression of this gene are 
prevalent in subclasses C and D of medulloblastoma [52]. Analysis of her tumor 
revealed an additional loss of heterozygosity on chromosome 17q [20] that is 
exclusively associated with subclasses C and D [52]. The potential biological link 
between OTX2 duplications in hemifacial microsomia and medulloblastoma raises 
the possibility of their comorbidity. While confirming this hypothesis will require the 
analysis of a large number of cases, we suggest clinicians be aware of the 
possibility of increased risk for medulloblastoma in HFM cases with OTX2 
duplications. 

Our study adds to the existing literature in multiple ways. First, our study 
considers the largest HFM pedigree to date, increasing the confidence of our 
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genetic analysis. Second, it is the first HFM study to combine whole exome 
sequencing analysis with the scanning of copy number variants. This approach 
increases the likelihood that the duplicated region is indeed the etiological site. 
Third, we present data from more than 1000 chromosomes of unaffected controls, 
which strongly diminishes the likelihood that the duplication is a polymorphism 
that segregates in the population. Fourth, we report an unbiased search using 
different systems biology approaches to find the most likely pathological gene in 
the region. These analyses implicated OTX2 as the most likely causal gene. Fifth, 
our findings suggest a potential shared etiology for HFM and medulloblastoma. 

Determining the causative gene for HFM can promote stratification of cases 
based on the molecular pathology, guide clinical care, offer reproductive 
alternatives to families that carry an OTX2 duplication, and facilitate definitive 
diagnosis, which is currently inadequate for HFM. Importantly, implicating OTX2 in 
this disorder can improve understanding of the basic molecular processes that 
underlie normal and pathological craniofacial development. 

MATERIALS AND METHODS 
Human Subject Research 

This study was approved by the Helsinki Committee at the Rambam Medical 
Center (Haifa, Israel), the Israeli Ministry of Health, and MIT's COUHES 
committee. 
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Coordinate System 

All alignment and genomic coordinates in this manuscript are reported according 
to hg19. All coverage values are reported after removing PCR duplicates. 

DNA Collection 

All DNA was derived from whole blood using standard procedures. 
Exome Sequencing 

Paired-end library preparation and exome enrichment were done following a 
streamlined protocol written by Blumenstiel et al. [53], using Agilent's SureSelect 
All Exon V.2 kit, which covers 98.2% of exons and splice sites, according to the 
Consensus CDS (CCDS) database [54]. Sequencing was performed at Counsyl 
(South San Francisco, USA) on a single flow cell on the lllumina HiSeq2000 with 
100 bp paired end reads (V.2 and V.3 on 3 lanes and 111.1 on 2 lanes). 

To increase the accuracy of our analysis, we processed the sequencing data with 
two distinct pipelines. First, we iteratively aligned the sequence reads with Bowtie 
[55] and with BWA [56]. Multi-mappers were excluded. Reads that failed to align 
were repeatedly trimmed by 10bp down to a minimum of 36bp and were 
processed in an additional round of alignment. The BAM files of all unique 
mappers from the different alignment rounds were merged and PCR duplicates 
were removed using SAMtools [57]. Variant calling of Bowtie-aligned reads was 
done using VarScan v2.8.8 [58] with mpileup2cns and the following options: -min- 
coverage 5 --min-freq-for-hom 0.9 --p-value 0.97 -strand-filter 1 . After alignment 
using BWA, variant calling was done using the Genome Analysis Toolkit (GATK) 
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[59], following the recommended workflow and filtering of low quality variant calls. 
In addition, we used lobSTR 1 .0.6 [60] to examine short tandem repeat variations 
in the exomes of 111.1. V.2, and V.3. We filtered for STRs genotyped in all three 
samples with at least 5x coverage in each, that fell within regions shared by all 
samples with IBD=1, and falling within annotated Refseq genes. Six loci were 
called as non-reference in all three samples. For each locus, the non-reference 
allele was found in at least one healthy control from a panel of more than 30 
healthy controls, mainly of European descent. 

Validation by Sanger Sequencing 

We used Primer3 [61] to design primers flanking candidate variants (+/-100bp 
upstream and downstream). We excluded primers that generated more than one 
in silico per product on the UCSC Genome Browser [62]. Sanger sequencing was 
done on an ABI 3730 DNA Analyzer. 

Genome-Wide Human SNP Array 6.0 

Genomic DNA was extracted from peripheral blood leukocytes using standard 
methods. We performed genotyping of subjects 111.1, III. 3, V.2, V.3 using the 
Affymetrix SNP 6.0 Array. We analyzed the 4 cases together with 471 unrelated 
Ashkenazi controls [29] (NCBI GEO GSE23636) using the Affymetrix genotyping 
console (v 4.1 .3) and Birdsuite [63] for genotype calling. 

Investigating exonic variations 

Annotation of exonic variations was done using SeattleSeq 137 [23] and minor 
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allele frequencies in dbSNP were taken from BioQ [64]. Filtering of variants was 
done using BEDTools [65] and custom Perl scripts (available upon request). 

IBD Calculations 

We used the Affymetrix genotyping console (v 4.1 .3) for genotype calling of our 4 
subjects together with 50 randomly selected individuals from the Ashkenazi 
controls (Bray 2010). Initial data analysis and selection of SNPs were carried out 
using PLINK [66]. We selected subsets of SNPs with MAF > 0.1 that are in 
approximate linkage equilibrium. This was carried out using the pairwise 
correlation method for LD pruning implemented in PLINK. We used the following 
parameters: window size = 50, step = 5, r A 2 threshold = 0.35. The pruned data 
contained 123209 SNPs. 

We used the pruned data as input to MERLIN [67] for pairwise IBD inference, with 
genetic map positions of 1 Mbp=1cM. Candidate IBD regions were selected based 
on pair-wise IBD probabilities. We marked all regions for which IBD probabilities 
for sharing an allele for all pairs of cases in the data were inferred to be higher 
than 0.5. We then extended the IBD region to include the tips of the chromosomes 
for cases when IBD=1 was detected in the first or the last SNP on the 
chromosome. 

Taqman CNV Assays 

We purchased custom Taqman probes to interrogate the CNV and flanking 
regions (probe start locations in NCBI build 37: chrl 4:2081 1565, chrl 4:56099993, 
chrl 4:57267695, chrl 4:57270923, chrl 4:572721 49, chrl 4:572771 01 , 
chrl 4:57328402, chrl 4:57476529, chrl 4:575971 48, chrl 4:5770071 5, 
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chrl 4:57868427, chrl 4:58725337, chrl 7:44203062). Reactions were carried out 
in 10ul, with 10ng genomic DNA and 10ng reference DNA (RnaseP), in 4 
replicates. Copy number was determined using the delta delta Ct method and 
CopyCaller v2.0 with HapMap samples NA06991 and NA11832 as calibrators. 
The OTX2 probes that were purchased from ABI failed to work despite repeated 
attempts. They produced non-Mendelian inheritance patterns for trios and 
reported deletions of the region in normal healthy controls. We therefore excluded 
these probes from the analysis. 

Prioritization using Biological Signatures 

Endeavour is available at: 

http://homes.esat.kuleuven.be/~bioiuser/endeavour/tool/endeavourweb.php and 
ToppGene is available at: http://toppqene.cchmc.org/prioritization.isp . In 
Endevaour, we used the following features: CisRegModule, Expression - 
SonEtAI, Expression - SuEtAI, Interaction - Bind, Interaction - BioGrid, 
Interaction - Hprd, Interaction - InNetDb, Interaction - Intact, Interaction - Mint, 
Interaction - String, Motif, Precalculated - Ouzounis, and Precalculated - 
Prospectr. In ToppGene, we used the following features: Domain, Pathway, 
Interaction, Transcription Factor Binding Site, Coexpression, Computational, 
MicroRNA, Drug, and Disease. 

Expression analysis of genes in the region 

Expression profiles were derived from the following experiments in GEO [68]: 
Pharyngeal arches E10.5: experiment GDS3803 with subjects GSM448013, 
GSM448014, GSM448015, GSM448016, and GSM448017. Urogenital epithelium 
E10.5: experiment GDS3173 with subjects GSM257875, GSM257932, and 
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GSM257933. Heart E10.5: experiment GDS627 with subjects GSM25150, 
GSM25151, GSM25152. Head E13.5: experiment GDS2874 with subjects 
GSM212558, GSM21 2560, GSM 21 2562, and GSM212564. Liver E13.5: 
experiment GDS2693 with subjects GSM1 77034, GSM1 77035, and GSM1 77036. 
Lung E13.5: experiment GSM290632 with subject GSE11539. All experiments 
were done using the Affymetrix Mouse Expression Array 430. The pharyngeal 
arches experiment reported results only from the A array and all the others 
reported both the A and B arrays. Therefore, in all E10.5 comparisons, we 
restricted the analysis only to genes that are on the A array. 

Based on experimental details in GEO or associated publications, the genetic 
background of all mice was concluded to be C57BL/6, with the exception of 
GDS3173 (E10.5 urogenital epithelium), the background of which was not 
documented. 

We downloaded the full soft file of each experiment from GEO, extracted the data 
from the relevant subjects, and normalized the expression data to range from zero 
to one for each subject. Experiments with multiple sets were averaged inside the 
same condition. Then, genes with more than one probe were averaged inside the 
same condition. Finally, we divided the expression of each gene in the affected 
tissue (pharyngeal arches and head) by expression in the control tissues (liver, 
lung, heart, and urogenital epithelium) and ranked the expression levels. 

Dosage sensitivity analysis 

Data was taken from Dataset_S1 .txt of Huang et al. [44]. 
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Zielinski_Fig1 




Figure 1: The five-generation pedigree. The family consists of five affected 
individuals spanning eight meioses. The proband (V.3) is indicated by an arrow. 
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Figure 2: The 14q22 duplicated region, (a) Raw intensity plots of the duplicated region (contained 
between the dotted lines) in the four affected individuals and 4 Askhenazi controls from 
(Bray et al. 2010). The signals represent the number of standard deviations of the probes from 
the mean value. The suspected copy number gain is marked by dotted vertical lines. The red line 
is a moving average with a window of 20 probes, (b) qPCR results of the affected family and two 
HapMap controls for genes in the duplcated region (OTX20S1 , EXC05, and NAA30) and two 
flanking genes (KTN1 and PSMA3) are consistent with the array results. 
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Figure 3: Prioritization of genes in 14q22. (a) Ranking of expression levels in pharyngeal 
arches (PA) compared to heart and urogenital epithelium (van Driel et al. 2006) in E10.5 
and expression in the head compared to liver, heart, and lung in E13.5 for genes in the 
duplicated region. Comparative expression ranked OTX2 highest in the affected tissues 
in all conditions (b) Ranking similarity of the molecular signatures of the genes in the 
duplicated region to causal genes in CHARGE, VACTERL, and Townes-Brocks using 
Endeavour and ToppGene. The average rank of both toolsis indicated in red. c) Ranking 
of dosage sensitivity predictions for 3 of the duplicated genes (Huang et al. 2010). 
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TABLES 



Clinical feature 


111.1 


1 1 1 .3 


IV.3 


V.2 


V.3 


Facial cleft 


+ 


+ 


+ 


+ 


+ 


Facial asymmetry 


+ 


+ 


+ 


+ 


+ 


Anotia/microtia 


- 


- 


- 


- 


+ 


Preauricular tags 


+ 


- 


+ 


+ 


+ 


Mandibular, maxillary hypoplasia 


+ 


+ 


+ 


+ 


+ 


Retrognathia 


- 


+ 


- 


+ 


+ 


Epibulbar dermoids 


- 


- 


- 


- 


- 


Cardiac anomalies 


- 


- 


- 


- 


- 


Renal anomalies 












Vertebral anomalies 












Medulloblastoma 










+ 



Table 1 : Clinical features of family members displaying HFM anomalies 



Filtering steps 


111.1 


V.2 


V.3 


Exonic variants 


22,252 


22,746 


23,175 


Non-synonymous 


9,552 


9,839 


10,072 


Rare variants 


560 


662 


665 


Variants in IBD segments 


84 


90 


72 


Shared variants 


40 


Shared with 1 1 1. 3 


0 



Table 2: Exome filtering steps 
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Supplemental Figure 1 




Sequence depth threshold 



Supplemental Figure 1: Distribution of exome sequencing cover- 
ages for the three datasets. 
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Supplemental Figure 2: Sanger traces of the four genes with biological activity that could be associated with HFM. 
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Supplemental Figure 3 
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Supplemental Figure 3b: qPCR results of the probands in the seven families. Both tested 
probes show copy number 2 of the critical region. NA06991 is a HapMap control. 
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Supplemental Note 

Our working hypothesis was that any point mutation that causes HFM will have a minor 
allele frequency (MAF) of less 0.1% in large sequencing projects. We based our 
hypothesis on the fact that HFM is estimated to occur at a frequency of 1 :5,000- 
1:20,000 births in the general population. Segregation analysis by Kaye et al. (1992) 
predicted that the sum of minor allele frequencies of all HFM causative genes is 1 :3000 
(after taking into account penetrance levels). The MAF of a single etiological variant is 
even smaller, since previous linkage analysis identified at least three non-overlapping 
segments. 

Moreover, the affected family is of Ashkenazi heritage. With the limited gene flow 
between the Ashkenazi population and other European populations, the causal mutation 
in our family is expected to be at even smaller frequencies in these large sequencing 
projects due to the low sampling rates of Ashkenazi Jews. To confirm this assumption, 
we compared the MAFs of more than 50 recessive mutations associated with Ashkenazi 
genetic disorders to the Exome Sequencing Project where we obtained most of the 
control chromosomes used in our analysis. These mutations are found at frequencies of 
1/25 to 1/70 in the Ashkenazi population, which is much higher than the expected 
frequency of a causative mutation of HFM. We found that the MAFs of these mutations 
were diluted by factors of more than 20x to 50x in ESP compared to the Ashkenazi 
population. Even if the causal mutation is found at a very unlikely rate of 1% in 
Ashkenazim, we expect it to be <0.05% in ESP. Thus, a 0.1% threshold is highly 
unlikely to miss the causative mutation. 

Similarly, we excluded variants that were seen at least twice in 42 unaffected Ashkenazi 
chromosomes. The probability to see a mutation with a true MAF of 0.1% in two 
individuals from this cohort is < 1 x10" 3 . Therefore, there is a very small risk of excluding 
the causative mutation using this MAF cutoff. 
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Supplementary Tables: 







Sequencing data 






Ref 


Non-ref 


data 


Ref 


0.99719561 


0.00280439 


Array 


Non-ref 


0.01714976 


0.98285024 



Supplementary Table 1: The probabilities to observe genetic variants in the 
sequencing data conditioned on the array data status and collapsed in all three 
individuals. 





lll.l 


V.2 


V.3 


# lanes (lllumina) 


2 


3 


3 


Average coverage 


95.32x 


99.24x 


105.26x 


Exome covered 


96.00% 


97.30% 


96.90% 


Exome covered > 5x 


90.40% 


91.60% 


91.30% 


Ts/Tv rate 


3 


2.89 


2.93 


Homozygous:heterozygous 


0.57 


0.58 


0.56 



Supplementary Table 2: A summary of the quality control indicators from the three 
exome sequencing datasets. 
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Supplementary Table 3: Variants shared IBD in individuals 111.1 , V.2, and V.3. 
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Supplementary Table 4: The Sanger sequencing results of the four genes with 
biological activity that could be attributed to HFM. 



